Abstract

An algorithm is proposed for minimizing certain nice $C^2$
functions $f$ on $E_n$ assuming only a computational knowledge of
$f$ and $\nabla f$. It is shown that the algorithm provides global
convergence at a rate which is eventually superlinear and possibly
quadratic. The algorithm is purely algebraic and does not require
the minimization of any functions of one variable.

Numerical  computation  on  specific  problems  with  as  many  as 
six  independent  variables  has  shown  that  the  method  compares  very 
favorably  with  the  best  of  the  other  known  methods.  The  method  is 
compared  with  the  Fletcher  and  Powell  method  for  a  simple  two 
dimensional  test  problem  and  for  a  six  dimensional  problem  arising 
in  control  theory. 


This note proposes an algorithm for minimizing certain nice
$C^2$ functions $f$ on $E_n$ assuming only a computational knowledge
of $f$ and $\nabla f$. It is shown that the algorithm provides global
convergence at a rate which is eventually superlinear and possibly
quadratic. The algorithm is purely algebraic and does not require
the minimization of any functions of one variable.

In the following, let $\delta$ and $r$ be positive numbers with
$\delta < \tfrac{1}{2}$. Let $f$ be a real-valued function defined on
$E_n$, $x^0$ be an arbitrary point of $E_n$, and $I_i$ be the $i$th
column of the $n \times n$ identity matrix, $I$. Let $S$ denote the
level set of $f$ at $x^0$, viz.: $S = \{x \in E_n : f(x) \le f(x^0)\}$.
Assume that for some open convex set $\hat{S}$ containing $S$,
$f \in C^2(\hat{S})$. Let $H(x)$ denote the Hessian of $f$ at $x$.
Assume that there exists a constant $\omega > 0$ such that for all
$u \in E_n$ and for all $x \in S$, $[u, H(x)u] \ge \omega \|u\|^2$.

An algorithm for minimizing $f(x)$ consists of performing the
following computations for $k = 0, 1, 2, \ldots$:

1. Compute the $n \times n$ matrix $Q(x^k)$ whose $j$th column is

$$\frac{\nabla f(x^k + \theta_k I_j) - \nabla f(x^k)}{\theta_k}$$

where $\theta_0 = r$ and

$$\theta_k = r\,\|\phi(x^{k-1})\| \quad \text{for } k = 1, 2, 3, \ldots,$$

in which $\phi$ is defined as follows:

(a) If $k = 0$, or if $Q(x^k)$ is singular, or if
$[\nabla f(x^k), Q^{-1}(x^k)\nabla f(x^k)] \le 0$,
set $\phi(x^k) = \nabla f(x^k)$.

(b) Otherwise, set $\phi(x^k) = Q^{-1}(x^k)\nabla f(x^k)$.

2. Consider the function

$$g(x^k, \gamma) = \frac{f(x^k) - f(x^k - \gamma\,\phi(x^k))}{\gamma\,[\nabla f(x^k), \phi(x^k)]}.$$

If $g(x^k, 1) < \delta$, choose $\gamma_k$ so that
$\delta \le g(x^k, \gamma_k) \le 1 - \delta$;
otherwise set $\gamma_k = 1$.

3. Set $x^{k+1} = x^k - \gamma_k\,\phi(x^k)$.
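The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the numerical results: the text does not say how $\gamma_k$ is to be found when $g(x^k,1) < \delta$, so the bisection in `step_length` is an assumed choice, and the helper names, the default values of `r`, `delta`, `tol`, and the test function are all hypothetical.

```python
import numpy as np

def fd_matrix(grad, x, theta):
    """Step 1: column j of Q is [grad(x + theta*I_j) - grad(x)] / theta."""
    n = x.size
    g0 = grad(x)
    Q = np.empty((n, n))
    for j in range(n):
        shifted = x.copy()
        shifted[j] += theta
        Q[:, j] = (grad(shifted) - g0) / theta
    return Q

def direction(grad, x, theta, k):
    """Rules (a) and (b): use Q^{-1} grad only when it gives descent."""
    g0 = grad(x)
    if k == 0:
        return g0
    try:
        p = np.linalg.solve(fd_matrix(grad, x, theta), g0)
    except np.linalg.LinAlgError:      # Q(x^k) is singular
        return g0
    return p if g0 @ p > 0 else g0

def step_length(f, x, phi, slope, delta, max_halvings=60):
    """Step 2, using bisection on gamma (an assumed search strategy)."""
    fx = f(x)

    def g(gamma):
        return (fx - f(x - gamma * phi)) / (gamma * slope)

    if g(1.0) >= delta:
        return 1.0
    lo, hi = 0.0, 1.0   # g tends to 1 as gamma -> 0, and g(1) < delta
    for _ in range(max_halvings):
        mid = 0.5 * (lo + hi)
        val = g(mid)
        if val < delta:
            hi = mid
        elif val > 1.0 - delta:
            lo = mid
        else:
            return mid
    return 0.5 * (lo + hi)

def minimize(f, grad, x0, r=1e-3, delta=0.25, tol=1e-8, max_iter=200):
    x = np.asarray(x0, dtype=float)
    theta = r                                # theta_0 = r
    for k in range(max_iter):
        g0 = grad(x)
        if np.linalg.norm(g0) < tol:
            break
        phi = direction(grad, x, theta, k)
        gamma = step_length(f, x, phi, g0 @ phi, delta)
        x = x - gamma * phi                  # step 3
        theta = r * np.linalg.norm(phi)      # theta_{k+1} = r * ||phi(x^k)||
    return x

# Hypothetical strongly convex test function with minimizer (1, -0.5).
def f(x):
    return (x[0] - 1)**2 + 2 * (x[1] + 0.5)**2 + 0.25 * (x[0] - 1)**4

def grad(x):
    return np.array([2 * (x[0] - 1) + (x[0] - 1)**3, 4 * (x[1] + 0.5)])

x_min = minimize(f, grad, [3.0, 2.0])
```

Bisection is used only because $g(x^k,\gamma)$ tends to 1 as $\gamma \to 0$ while $g(x^k,1) < \delta$, so a value of $\gamma$ with $\delta \le g \le 1-\delta$ lies in $(0,1)$; any other search with that property would serve equally well.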

Theorem: Under the assumptions stated above,

1) the sequence $\{x^k\}$ converges to a point $z$ minimizing $f$,

2) there exists a number $N$ such that if $k > N$ then $\gamma_k = 1$, and

3) the rate of convergence of $\{x^k\}$ is superlinear.


Before  proving  this  theorem  we  shall  first  establish  2  lemmas. 


Lemma 1. If the sequence $\{\theta_k\}$ of the above theorem converges
to 0, then $\{\|Q(x^k) - H(x^k)\|\} \to 0$ and for some $K$, there
exists a positive number $\omega'$ such that for all $k \ge K$ and any
$h \in E_n$, $[h, Q(x^k)h] \ge \omega' \|h\|^2$.

Proof. The existence of $H(x)$ implies that given $\epsilon > 0$ there
exists $\delta > 0$ such that for all $h \in E_n$ with $\|h\| < \delta$,
we have the inequality
$\|\nabla f(x+h) - \nabla f(x) - H(x)h\| < \epsilon \|h\|$. For large
$k$ we have that $\|\theta_k I_j\| < \delta$, $1 \le j \le n$, and
therefore

$$\left\| \frac{\nabla f(x^k + \theta_k I_j) - \nabla f(x^k)}{\theta_k} - H_j(x^k) \right\| < \epsilon, \quad (1 \le j \le n),$$

where $H_j(x^k)$ denotes the $j$th column of $H(x^k)$. Thus
$\{\|Q(x^k) - H(x^k)\|\} \to 0$.

To complete the proof observe that because $[h, H(x^k)h] / \|h\|^2$ is
bounded below and $Q(x^k)$ is eventually close to $H(x^k)$, it follows
that $[h, Q(x^k)h] / \|h\|^2$ is also eventually bounded below.
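The first claim of Lemma 1 can be checked numerically: the matrix of gradient differences approaches the Hessian as the increment shrinks. The sketch below assumes a hypothetical smooth function $f(x_1,x_2) = x_1^4 + x_1^2 x_2 + x_2^2$, chosen only because its gradient and Hessian are easy to write down; the helper name `fd_matrix` is likewise an assumption.

```python
import numpy as np

def grad(x):
    # Gradient of the hypothetical f(x1, x2) = x1^4 + x1^2 * x2 + x2^2
    return np.array([4 * x[0]**3 + 2 * x[0] * x[1], x[0]**2 + 2 * x[1]])

def hess(x):
    # Exact Hessian of the same function
    return np.array([[12 * x[0]**2 + 2 * x[1], 2 * x[0]],
                     [2 * x[0], 2.0]])

def fd_matrix(grad, x, theta):
    """Column j is [grad(x + theta*I_j) - grad(x)] / theta."""
    n = x.size
    g0 = grad(x)
    Q = np.empty((n, n))
    for j in range(n):
        shifted = x.copy()
        shifted[j] += theta
        Q[:, j] = (grad(shifted) - g0) / theta
    return Q

x = np.array([1.0, 0.5])
thetas = [1e-1, 1e-2, 1e-3]
# Spectral norm of Q - H for each increment theta
errors = [np.linalg.norm(fd_matrix(grad, x, t) - hess(x), 2) for t in thetas]
for t, e in zip(thetas, errors):
    print(f"theta = {t:g}   ||Q - H|| = {e:.3e}")
```

For this function the error shrinks roughly in proportion to the increment, which is the behavior the lemma relies on when $\theta_k \to 0$.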


Lemma 2. Assume $A$ and $B$ are square matrices satisfying
$\|(A-B)A^{-1}\| < 1$. Then $B^{-1}$ exists and

$$\|A^{-1} - B^{-1}\| \le \|A-B\|\,\|A^{-1}\|^2 \,(1 - \|A-B\|\,\|A^{-1}\|)^{-1}.$$

Proof. Set $C = B - A$, and compute that

$$\|A^{-1} - B^{-1}\| = \|A^{-1} - (A+C)^{-1}\| = \|A^{-1}(I - [(A+C)A^{-1}]^{-1})\| = \|A^{-1}(I - [I + CA^{-1}]^{-1})\|.$$

By hypothesis $\|CA^{-1}\| < 1$, whence:

$$[I + CA^{-1}]^{-1} = I - CA^{-1} + (CA^{-1})^2 - \cdots = AB^{-1}.$$

Thus

$$\|A^{-1} - B^{-1}\| \le \|A^{-1}\|\,\|CA^{-1}(I - CA^{-1} + \cdots)\| \le \|C\|\,\|A^{-1}\|^2 (1 - \|CA^{-1}\|)^{-1}.$$
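As a sanity check, the bound of Lemma 2 can be evaluated for a concrete pair of matrices. The matrices below are arbitrary illustrative values (an assumption, not taken from the paper), and the spectral norm is used as the matrix norm.

```python
import numpy as np

def norm(M):
    # Spectral norm, which is submultiplicative as the lemma requires
    return np.linalg.norm(M, 2)

A = np.array([[4.0, 1.0], [2.0, 3.0]])
B = A + np.array([[0.1, -0.2], [0.05, 0.1]])   # small perturbation of A

A_inv = np.linalg.inv(A)
hypothesis = norm((A - B) @ A_inv)             # must be < 1

lhs = norm(A_inv - np.linalg.inv(B))
rhs = norm(A - B) * norm(A_inv)**2 / (1.0 - norm(A - B) * norm(A_inv))

print(f"||(A-B)A^-1|| = {hypothesis:.4f}")
print(f"||A^-1 - B^-1|| = {lhs:.4f} <= bound = {rhs:.4f}")
```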


We  now  turn  to  the  proof  of  the  theorem. 
Proof of Theorem.

We show first that the set $S$ is bounded. If not, there is an
unbounded sequence, say $\{z^k\}$, in $S$. Take $u \in S$. Then by
Taylor's theorem and the fact that $H(x)$ is bounded below by $\omega$
on $S$ we get

$$f(z^k) \ge f(u) + \|z^k - u\| \left[ \frac{\omega}{2}\|z^k - u\| - \|\nabla f(u)\| \right],$$

showing that $f(z^k) > f(x^0)$ for large $k$, which contradicts
$z^k \in S$; thus $S$ must be bounded.

Clearly, by definition of $\phi(x^k)$, $\nabla f(x^k) \ne 0$ implies
$[\nabla f(x^k), \phi(x^k)] > 0$. Arguing as in [1] p. 148, after we
observe that $\phi$ is bounded on $S$, we conclude that
$\{[\nabla f(x^k), \phi(x^k)]\} \to 0$.

Let $\{x^n\}$ be a subsequence of $\{x^k\}$ with the property that
$\{x^{n+1} - x^n\} \to 0$. By Lemma 1,
$[\nabla f(x^n), Q(x^n)\nabla f(x^n)] \ge \|\nabla f(x^n)\|^2 \omega'$
for all $n$ sufficiently large. Take $M$ so that $\|Q(x^n)\| \le M$,
$n = 1, 2, \ldots$. Then

$$[\nabla f(x^n), Q^{-1}(x^n)\nabla f(x^n)] = [\nabla f(x^n), \phi(x^n)] \ge M^{-2}\omega' \|\nabla f(x^n)\|^2,$$

showing that $\{\nabla f(x^n)\} \to 0$. If $z$ is any cluster point of
$\{x^n\}$, clearly $\nabla f(z) = 0$. Because
$f(x) - f(z) \ge \tfrac{\omega}{2}\|x - z\|^2$, it follows that $z$ is
the unique minimizer of $f$; moreover, since $\{f(x^k)\}$ is strictly
decreasing, both $\{f(x^k)\}$ and $\{f(x^n)\}$ converge downward to
$f(z)$. Thus if $z'$ is any cluster point of $\{x^k\}$, $f(z') = f(z)$,
which implies that $z' = z$. Consequently, $\{x^k\} \to z$. This
implies that $\phi(x^k) \to 0$.

We now turn to the proof of 2). In what follows the superscripts
$k$ on $x$ will often be omitted. By Taylor's theorem we may write:

$$g(x, \gamma) = 1 - \frac{[h, H(\xi)h]}{2[\nabla f(x), h]}$$

where $h = \gamma Q^{-1}(x)\nabla f(x)$, and $\xi$ lies "between" $x$
and $x - h$. Set $H(\xi) = Q(x) + H(\xi) - Q(x)$, then we calculate
that:

$$g(x, \gamma) = 1 - \frac{\gamma}{2} - \frac{\gamma\,[(H(\xi) - Q(x))Q^{-1}(x)\nabla f(x),\; Q^{-1}(x)\nabla f(x)]}{2[\nabla f(x), Q^{-1}(x)\nabla f(x)]}.$$

As above we have that

$$[u, Q^{-1}(x)u] \ge \omega' M^{-2} \|u\|^2$$

whence:

$$\left| 1 - \frac{\gamma}{2} - g(x, \gamma) \right| \le \frac{\gamma\,\|H(\xi) - Q(x)\|\,M^2}{2\omega'^3}.$$

Since $\{\|\xi^k - x^k\|\} \to 0$, we have by uniform continuity on $S$
that $\{\|H(\xi^k) - Q(x^k)\|\} \to 0$. Hence $g(x^k, 1) \to \tfrac{1}{2}$,
which exceeds $\delta$. Thus eventually $\gamma_k$ can be taken always
to be unity.
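The limiting value $g(x,1) = \tfrac{1}{2}$ can be seen exactly in the simplest case: a quadratic $f$ with $Q$ equal to the Hessian, where the remainder term $H(\xi) - Q(x)$ vanishes. The sketch below assumes a hypothetical quadratic $f(x) = \tfrac{1}{2}[x, Ax] - [b, x]$ with arbitrary illustrative data.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite Hessian
b = np.array([1.0, 2.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

x = np.array([2.0, -1.0])
phi = np.linalg.solve(A, grad(x))        # phi = Q^{-1} grad f with Q = H = A

# g(x, 1) = [f(x) - f(x - phi)] / [grad f(x), phi]
g1 = (f(x) - f(x - phi)) / (grad(x) @ phi)
print(g1)
```

With $\gamma = 1$ the formula for $g$ reduces to $1 - \tfrac{1}{2}$ when $H(\xi) = Q(x)$, so any $\delta < \tfrac{1}{2}$ accepts the full step.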

To conclude the proof we write:

$$x^{k+1} - z = x^k - z - \gamma_k Q^{-1}(x^k)\nabla f(x^k)$$

$$= x^k - z - \gamma_k Q^{-1}(x^k)Q(x^k)(x^k - z) + \gamma_k Q^{-1}(x^k)[H(x^k)(x^k - z) - \nabla f(x^k)]$$

$$\quad + \gamma_k Q^{-1}(x^k)[Q(x^k) - H(x^k)](x^k - z).$$

Since $\|H(x^k)(x^k - z) - \nabla f(x^k)\| < \epsilon \|x^k - z\|$,
whenever $\|x^k - z\|$ is sufficiently small, we get that

$$\|x^{k+1} - z\| \le |1 - \gamma_k|\,\|x^k - z\| + \epsilon\,\gamma_k \|Q^{-1}(x^k)\|\,\|x^k - z\| + \gamma_k \|Q^{-1}(x^k)\|\,\|Q(x^k) - H(x^k)\|\,\|x^k - z\|.$$

Choose $k$ sufficiently large so that $\gamma_k = 1$ and
$\|Q(x^k) - H(x^k)\| < \epsilon$. Since Lemma 1 gives
$\|Q^{-1}(x^k)\| \le 1/\omega'$, we then have
$\|x^{k+1} - z\| \le \frac{2\epsilon}{\omega'}\|x^k - z\|$, showing the
superlinear convergence.
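Superlinear convergence means that the ratio $\|x^{k+1} - z\| / \|x^k - z\|$ tends to zero. A one-dimensional sketch makes this visible. It assumes a hypothetical function $f(x) = e^x - x$ with minimizer $z = 0$, takes $\gamma_k = 1$ throughout (the regime covered by part 2 of the theorem), and uses an illustrative value $r = 0.1$.

```python
import math

def grad(x):
    # Gradient of the hypothetical f(x) = exp(x) - x, minimized at z = 0
    return math.exp(x) - 1.0

r = 0.1
x, theta = 0.5, r
ratios = []
for _ in range(4):
    q = (grad(x + theta) - grad(x)) / theta   # one-dimensional Q(x^k)
    phi = grad(x) / q                         # phi = Q^{-1} grad f
    x_next = x - phi                          # full step, gamma_k = 1
    ratios.append(abs(x_next) / abs(x))       # |x^{k+1} - z| / |x^k - z|
    x, theta = x_next, r * abs(phi)           # theta_{k+1} = r * |phi(x^k)|

for k, ratio in enumerate(ratios):
    print(f"k = {k}   error ratio = {ratio:.3e}")
```

Because the increment $\theta_k$ is tied to the size of the previous step, the difference quotient becomes a better approximation of the second derivative as the iterates converge, and the error ratios keep shrinking.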

Remarks.

1. $Q(x^k)$ can be any sequence of $n \times n$ matrices with the
property that $\|Q(x^k) - H(x^k)\| \to 0$. The hypothesis that $H(x)$
is bounded below by $\omega$ on $S$ can be replaced by the hypothesis
that $S$ is bounded and $f$ has a unique point $z$ where the
gradient vanishes and $H(x)$ is bounded below by $\omega$ on some
neighborhood of $z$. Indeed it is sufficient that $H(x^k)$ be
bounded below on any infinite subsequence of $\{x^k\}$.

2. If $f \in C^3$ on $S$ then by the application of Kantorovich's [2]
theorem, the ultimate rate of convergence is actually quadratic.


Numerical  Results 


The  method  of  this  paper  has  been  used  on  some  dozen  test 
problems.  For  comparison,  the  method  of  Fletcher  and  Powell  [3]  and 
in  some  cases  the  method  of  steepest  descent  have  also  been  tried  on 
the  same  problems.  In  all  cases  the  method  of  steepest  descent 
converged  very  much  more  slowly  than  the  other  two  methods. 

Table 1 shows the results when the faster two methods were tried
on a simple test problem of Fletcher and Powell [3] (originally given
by Rosenbrock)

$$f(x_1, x_2) = (x_2 - x_1^2)^2 + .01(1 - x_1)^2.$$
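For experimentation, the test function and its gradient can be written out as below. This is only a sketch of the objective as stated in the text; the function names are assumptions, and no starting point or results from Table 1 are reproduced here.

```python
import numpy as np

def f(x):
    """Scaled Rosenbrock test function (x2 - x1^2)^2 + 0.01 (1 - x1)^2."""
    x1, x2 = x
    return (x2 - x1**2)**2 + 0.01 * (1.0 - x1)**2

def grad(x):
    """Analytic gradient of f, obtained by differentiating term by term."""
    x1, x2 = x
    return np.array([
        -4.0 * x1 * (x2 - x1**2) - 0.02 * (1.0 - x1),
        2.0 * (x2 - x1**2),
    ])

z = np.array([1.0, 1.0])   # both squared terms vanish here
print(f(z), grad(z))
```

Both squared terms vanish at $(1, 1)$, so that point is the minimizer and the gradient is zero there.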

The  number  of  steps  required  by  the  Fletcher  and  Powell  method  was 
about  the  same  as  for  the  method  of  this  paper.  It  is  hard  to  compare 
the  time  required  by  the  two  methods,  particularly  because  of  the  fact 
that  in  the  Fletcher  and  Powell  method,  it  is  easy  to  waste  a  lot  of 
time  obtaining  a  more  accurate  minimum  than  is  really  essential  in  the 
direction  the  method  specifies.  However,  the  method  of  Fletcher  and  Powell 
does  specify  that  the  function  be  minimized  in  this  direction,  and  one  needs 
to  have  some  reasonable  criteria  satisfied  for  the  approximate  minimum. 

The "number of function and gradient evaluations" column in Table 1 is
not absolutely accurate because the number of function evaluations
required is not exactly the same as the number of gradient evaluations
required. However, the numbers in the columns are approximately
correct. In the case of the Fletcher and Powell method, if a less
stringent minimization requirement were used, it is possible that the
number of function evaluations could be cut somewhat.

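Figure 1 shows the results when all three methods were tried on
a six dimensional problem in control theory. The problem consists of
minimizing the function $w(\lambda)$ where $\lambda$ is a 6-dimensional
vector. The vector $x(\lambda)$ is defined as

$$x(\lambda) = \int_0^T B(\tau)\,G(\lambda, \tau)\,d\tau$$

where the first two columns of $B(t)$ are

$$\begin{pmatrix} -4\sin t + 3t \\ 4\cos t - 3 \\ 2(\cos t - 1) \\ 2\sin t \\ 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 2(1 - \cos t) \\ -2\sin t \\ -\sin t \\ \cos t \\ 0 \\ 0 \end{pmatrix},$$

$T$ is constant, and the 3-component vector

$$G(\lambda, t) = \begin{cases} \dfrac{\lambda \cdot B(t)}{\|\lambda \cdot B(t)\|} & \text{if } \|\lambda \cdot B(t)\| > 1, \\[1ex] 0 & \text{if } \|\lambda \cdot B(t)\| \le 1. \end{cases}$$

$x^0$ is a fixed 6-dimensional vector, and $\gamma$ is a scalar.
Then

$$w(\lambda) = [\lambda, x(\lambda) - x^0] - \gamma.$$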
In  Figure  1  each  dot  represents  a  single  step  in  the  iterative
process.  The  crosses  each  represent  ten  steps  in  the  method  of  steepest 
descent.  In  all  cases  the  same  final  results  were  obtained.  Notice 
that  the  Fletcher  and  Powell  method  starts  out  faster  than  the  present 
method  but  that  the  ultimate  convergence  is  more  rapid  for  this  new 
method.  Such  behavior  seems  to  occur  quite  often. 


[Figure 1. Results of the three methods on the six dimensional control
problem. Legend: O Fletcher and Powell; x Steepest Descent; • Our
Method. Horizontal axis: Evaluation. Annotations: x(0) = (-.2, .2, .05,
.05, .125, -.02); w = -.45846190; vector (2.8603, .7792, -.1363, .4144,
-2.4161, 1.1614); control on (0, .1334) and (.6349, .9600).]